SMILES2vec

نویسندگان

  • Garrett B. Goh
  • Nathan O. Hodas
  • Charles Siegel
  • Abhinav Vishnu
چکیده

Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES strings to predict chemical properties, without the need for additional explicit chemical information, or the “grammar” of how SMILES encode structural data. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for learning a range of distinct chemical properties including toxicity, activity, solubility and solvation energy, while outperforming contemporary MLP neural networks that uses engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, this localization identifies specific parts of a chemical that is consistent with established first-principles knowledge of solubility with an accuracy of 88%, demonstrating that neural networks can learn technically accurate chemical concepts. The fact that SMILES2vec validates established chemical facts, while providing state-of-the-art accuracy, makes it a potential tool for widespread adoption of interpretable deep learning by the chemistry community.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Smiles2vec: Predicting Chemical Properties from Text Representations

Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES strings to predict a broad range of chemic...

متن کامل

Gains from diversification on convex combinations: A majorization and stochastic dominance approach

By incorporating both majorization theory and stochastic dominance theory, this paper presents a general theory and a unifying framework for determining the diversification preferences of risk-averse investors and conditions under which they would unanimously judge a particular asset to be superior. In particular, we develop a theory for comparing the preferences of different convex combination...

متن کامل

Improved immunogenicity of tetanus toxoid by Brucella abortus S19 LPS adjuvant.

BACKGROUND Adjuvants are used to increase the immunogenicity of new generation vaccines, especially those based on recombinant proteins. Despite immunostimulatory properties, the use of bacterial lipopolysaccharide (LPS) as an adjuvant has been hampered due to its toxicity and pyrogenicity. Brucella abortus LPS is less toxic and has no pyrogenic properties compared to LPS from other gram negati...

متن کامل

Steady electrodiffusion in hydrogel-colloid composites: macroscale properties from microscale electrokinetics.

A rigorous microscale electrokinetic model for hydrogel-colloid composites is adopted to compute macroscale profiles of electrolyte concentration, electrostatic potential, and hydrostatic pressure across membranes that separate electrolytes with different concentrations. The membranes are uncharged polymeric hydrogels in which charged spherical colloidal particles are immobilized and randomly d...

متن کامل

Perturbative Analysis of Dynamical Localisation

In this paper we extend previous results on convergent perturbative solutions of the Schrödinger equation of a class of periodically timedependent two-level systems. The situation treated here is particularly suited for the investigation of two-level systems exhibiting the phenomenon of (approximate) dynamical localisation. We also present a convergent perturbative expansion for the secular fre...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017